
(NIPS 2018 Oral) Generalisation of structural knowledge in the Hippocampal-Entorhinal system

Keywords: [Hippocampal] [Cortex] [Hebbian Memory] [Hippocampal-Entorhinal System]

Whittington J, Muller T, Mark S, et al. Generalisation of structural knowledge in the hippocampal-entorhinal system[C]//Advances in Neural Information Processing Systems. 2018: 8484-8495.



1. Overview


1.1. Motivation

  • a central problem to understanding intelligence is the concept of generalisation
  • hippocampal-entorhinal system is known to be important for generalisation

This paper proposes that, to generalise structural knowledge, representations of the structure of the world (how entities in the world relate to each other) need to be separated from representations of the entities themselves

  • ANN embedded with hierarchy and fast Hebbian memory
  • representations can effectively utilise memories
  • shows a preserved relationship between entorhinal grid cells and hippocampal place cells across environments
  • explicitly represented structure can be combined with sensory information in a conjunctive code unique to each environment. Sensory observations are thus fit into previously learned structural knowledge, leading to generalisation

1.2. Contribution

1.2.1. Neuroscience

  • find an interpretation of grid cells, place cells and remapping that offers a mechanistic understanding for the hippocampal involvement in generalisation of knowledge across domains
  • results suggest spatial representations found in the brain may be an instance of a more general coding mechanism organising knowledge across multiple domains

1.2.2. Machine Learning

  • build a network where fast Hebbian learning interacts with slow statistical learning
    • this allows the network to learn representations whereby memories are not only stored in a Hebbian network for one-shot retrieval within a domain, but also benefit from statistical knowledge shared across domains - allowing zero-shot inference

1.3. Generally



  • implement its proposal in an ANN tasked with predicting sensory observations when walking on 2D graph worlds, where each vertex is associated with a sensory experience
  • to make accurate predictions, the agent must learn the underlying hidden structure of the graphs
  • learning is unsupervised: the network receives only sensory observations and actions
  • place cells form a conjunctive representation of sensory identity and structure. This conjunctive representation forms a Hebbian memory, which bridges structure and identity, allowing the same structural code to be reused across environments
  • combine fast Hebbian learning of episodic memories, with gradient descent which slowly learns to extract statistics of these memories

1.4. Details

  • propose that the statistics of memories in hippocampus are extracted by cortex
  • propose that future hippocampal representations/memories are constrained to be consistent with the learned structural knowledge
  • choose memory storage and addressing to be computationally and biologically plausible (rather than using other types of differentiable memory more akin to RAM), as well as using hierarchical processing. This enables the model to discover representations that are useful both for navigation and for addressing memories

1.5. In Neuroscience

  • generalisation of statistical structure (the relationships between objects in the world) imbues an agent with the ability to fit things/concepts together that share the same statistical structure, but differ in the particularities
  • hippocampus is known to be important for generalisation, memory, problems of causality, inferential reasoning, transitive reasoning, conceptual knowledge representation, one-shot imagination and navigation
  • in spatial navigation there is a good understanding of neuronal representations in both the hippocampus (place cells, landmark cells) and the medial entorhinal cortex (grid cells, border cells, object vector cells)
  • place cells and grid cells have had a radical impact on neuroscience, leading to the 2014 Nobel Prize in Physiology or Medicine
  • place and grid cells are similar in that they have a stable firing pattern for specific regions of space
  • place cells fire only in a single location (or a few locations) in a given environment, whereas grid cells fire in a regular lattice pattern tiling the space. These cells cemented the idea of a ‘cognitive map‘, where an animal holds an internal representation of the space it navigates
  • other entorhinal cell types (border, object vector cells) appear to have disparate roles in coding space
  • remapping (traditionally thought to be random) → the place cell code differs between two structurally identical environments



2. Model




  • consider an agent passively moving on a 2D graph, observing a non-unique sensory stimulus (an image) on each vertex
  • if the agent wishes to understand its environment, it should maximise its model’s probability of observing each stimulus
  • trained on many environments sharing the same structure (2D graph)
  • one approach to this problem: have an abstract representation of space encoding relative locations, and then place a memory of what stimulus was observed at that (relative) location
  • since the agent understands where it is in space, this allows for accurate state predictions to previously visited nodes even if the agent has never travelled along that particular edge before (Figure 2c)
  • grid cell as base for constructing abstract representation of space
  • place cell representations for the formation of fast episodic memories
  • posit that this (place cells forming a conjunction) is done hierarchically across spatial frequencies, such that higher-frequency statistics can be reused repeatedly across space. This reduces the number of weights that need to be learnt
  • grid cells to be recurrent through time
  • view the hippocampal-entorhinal system as one that performs inference
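The setup above can be sketched concretely. The following is a minimal, hypothetical sketch of one "world": a grid graph whose vertices carry non-unique sensory identities, with an agent taking discrete actions; sizes and names are illustrative, not the paper's exact configuration.

```python
import numpy as np

def make_world(n=8, n_s=10, seed=0):
    """One environment: an n x n grid where each vertex holds one of
    n_s (non-unique) sensory identities."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_s, size=(n, n))

def step(pos, action, n):
    """Move on the grid; the paper's action set is up/down/left/right/stay."""
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1),
             "right": (0, 1), "stay": (0, 0)}
    di, dj = moves[action]
    i = min(max(pos[0] + di, 0), n - 1)   # clamp at the borders
    j = min(max(pos[1] + dj, 0), n - 1)
    return (i, j)

world = make_world()
pos = step((0, 0), "right", 8)
obs = world[pos]  # sensory observation at the new vertex
```

Because the same stimulus can appear at many vertices, the agent cannot predict from identity alone; it must infer where it is on the graph.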

2.1. Model Summary

  • the model is a neural network and learns structure across tasks
  • optimise end-to-end via backpropagation through time
  • the central (attractor) network employs Hebbian learning to rapidly remember the conjunction
  • a generative temporal model learns how to use the Hebbian memory most efficiently given the common statistics of transitions across worlds

2.2. Notation



  • a layer of activations with vector notations


  • Element
  • s. index for sensory
  • j. index for phases


2.3. Generative Model



  • g. grid cell
  • p. place cell
  • M. agent’s memory
  • a. action
  • Θ. parameters of generative model
  • x. one-hot vector where each of its n_s elements represent a sensory identity
  • g&p (learned instead of hard-coded). come in different frequencies (hierarchies) indexed by superscript f

2.3.1. Grid Cells

  • to predict where we will be, we can transition from our current location based on our heading (path integration, Fig 2c)



  • f. functions specific to the distribution in question

  • connections in D_a are from low frequency to the same or higher frequency only (or alternatively only within frequency).
  • separate into hierarchical scales so that high frequency statistics can be reused across lower frequency statistics, i.e. learning and knowledge is reused across space
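A minimal sketch of this transition, assuming a form g_t = f(D_a g_{t-1}) with one weight matrix per action and a block mask enforcing the frequency constraint (the exact parameterisation and the direction convention for "low to high frequency" here are illustrative):

```python
import numpy as np

n_f = [10, 10, 8, 6, 6]          # grid cells per frequency (from the paper)
n_g = sum(n_f)
starts = np.cumsum([0] + n_f)    # block boundaries per frequency
rng = np.random.default_rng(0)

# Block mask: target block i receives input only from blocks j >= i
# (one convention for "same or lower frequency"; illustrative only).
mask = np.zeros((n_g, n_g))
for i in range(len(n_f)):
    for j in range(i, len(n_f)):
        mask[starts[i]:starts[i+1], starts[j]:starts[j+1]] = 1.0

# One masked transition matrix D_a per action (5 actions in the paper).
D_a = {a: rng.normal(0, 0.1, (n_g, n_g)) * mask for a in range(5)}

def transition(g_prev, action):
    # f chosen as tanh purely for illustration
    return np.tanh(D_a[action] @ g_prev)

g = transition(rng.normal(size=n_g), 0)
```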


2.3.2. Place Cells



  • for retrieving memories
  • stored memories are extracted via an attractor network (Fig 2b) using grid cells as input - i.e. grid cells act as an index for memory extraction



2.3.3. Data



  1. categorical distribution


  • sum over phases
  • f_c*. MLP
  • f* is chosen to be 0 (i.e. only the highest frequency is included)
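The decoding step above can be sketched as a small MLP mapping (highest-frequency) place cells to a categorical distribution over sensory identities. Shapes and the MLP architecture here are assumptions for illustration, not the paper's exact f_c.

```python
import numpy as np

rng = np.random.default_rng(0)
n_p, n_s = 100, 10                       # place cells, sensory identities (illustrative)
W1, b1 = rng.normal(0, 0.1, (32, n_p)), np.zeros(32)
W2, b2 = rng.normal(0, 0.1, (n_s, 32)), np.zeros(n_s)

def predict_x(p):
    """Categorical distribution p(x | place cells) via a small MLP + softmax."""
    h = np.tanh(W1 @ p + b1)
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())    # stable softmax
    return e / e.sum()

probs = predict_x(rng.normal(size=n_p))
```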

2.4. Inference Network



  • the posterior is intractable, so it is approximated by



  • phi. parameters of the inference network

  • learn Θ and phi by maximising the ELBO within the VAE framework
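Schematically, this objective takes the standard ELBO form (notation here is a simplified sketch; the paper's exact factorisation over time steps and frequencies is more involved):

$$
\mathcal{L}(\Theta, \phi) \;=\; \mathbb{E}_{q_\phi(g,\,p \,\mid\, x,\,a)}\!\left[\log p_\Theta(x \mid p)\right] \;-\; D_{\mathrm{KL}}\!\left( q_\phi(g,\,p \mid x,\,a) \,\middle\|\, p_\Theta(g,\,p \mid a,\,M) \right)
$$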

2.4.1. Place Cells



2.4.2. Grid Cells



2.5. Hebbian Memories

  • when enter a new environment, memory is reset to be empty (zeros)
  • memories of place cell representations are stored in Hebbian weights between place cells (M_t)
  • allow rapid learning when entering a new environment


  • p^. place cells generated from inferred grid cells
  • λ&η. the rate of forgetting and remembering
  • connections from high to low frequencies are set to zero, so that memories are retrieved hierarchically
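A minimal sketch of the update, assuming an outer-product Hebbian rule combining the inferred place cells p and the place cells p̂ generated from inferred grid cells, with forgetting rate λ and learning rate η (the exact outer-product terms are one plausible form, not a verbatim reproduction of the paper's equation; the high-to-low frequency mask is omitted for brevity):

```python
import numpy as np

n_p = 100                     # number of place cells (illustrative)
lam, eta = 0.99, 0.5          # forgetting and remembering rates (illustrative values)

def hebbian_update(M, p, p_hat):
    """One Hebbian memory step: decay old memories, store the new
    conjunction between inferred (p) and generated (p_hat) place cells."""
    dM = np.outer(p - p_hat, p + p_hat)
    return lam * M + eta * dM

rng = np.random.default_rng(0)
M = np.zeros((n_p, n_p))      # memory is reset on entering a new environment
M = hebbian_update(M, rng.normal(size=n_p), rng.normal(size=n_p))
```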


  • best results. when two separate matrices were used



  • x̂. retrieved memory, with the sensorium as input to the attractor network

2.5.1. Retrieval

  • attractor network



  • τ. iteration of the network

  • α. decay term
  • h_0. input, from grid cells or sensorium (depending on for generative or inference), dimensions scaled appropriately
  • output. retrieved memory (place cell code)
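Retrieval can be sketched as iterating a decaying recurrence over the memory matrix until the activity settles; the nonlinearity and constants below are assumptions for illustration.

```python
import numpy as np

def retrieve(M, h0, alpha=0.7, n_iter=20):
    """Attractor retrieval: iterate h <- f(alpha*h + M @ h) from input h0
    (grid cells in the generative direction, sensorium in inference).
    The settled state is the retrieved place cell code."""
    h = h0.copy()
    for _ in range(n_iter):
        h = np.tanh(alpha * h + M @ h)   # alpha: decay on the previous state
    return h

rng = np.random.default_rng(0)
n_p = 100
M = np.zeros((n_p, n_p))                 # empty memory: activity simply decays
p = retrieve(M, rng.normal(size=n_p))
```

With stored memories in M, the recurrence instead pulls the state toward the nearest stored place cell pattern; with an empty memory it decays toward zero, as here.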

2.6. Model Implication

  • believe that using more biologically realistic computational mechanisms (Hebbian Memory instead of LSTM) will facilitate further incorporation of neuroscience-inspired phenomena, such as successor representations or replay

2.7. Details

  • although a Bayesian formulation is presented, the best results were obtained by using only the means of the above distributions


  • first item. cross entropy loss
  • other item. squared error loss between inferred and generated variables
  • 5 different frequencies. n_f as [10, 10, 8, 6, 6]
  • environment square. [8, 10, 12]
  • agent changes to a new environment after 2000~5000 steps
  • a_t. up, down, left, right, stay still
  • time truncated to 25 steps
  • two separate memory matrices. use additional memory module in grid cell inference
  • typically, after 200-300 environments the agent has fully learned the structure (~50000 gradient updates)
  • remove a_t from the generative model so that the generative model can more easily capture the true underlying transition statistics
  • place-like representations are learned in the attractor
  • grid-like representations are learned in the generative temporal model
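The composite objective described in the bullets above (a cross-entropy term on the predicted sensory identity, plus squared-error terms tying inferred and generated variables together) can be sketched as follows; variable names and shapes are illustrative.

```python
import numpy as np

def loss(probs, x_onehot, g_inf, g_gen, p_inf, p_gen):
    """Cross-entropy on the sensory prediction plus squared error
    between inferred and generated grid/place variables."""
    ce = -np.sum(x_onehot * np.log(probs + 1e-12))
    sq = np.sum((g_inf - g_gen) ** 2) + np.sum((p_inf - p_gen) ** 2)
    return ce + sq

rng = np.random.default_rng(0)
probs = np.full(10, 0.1)            # uniform prediction over 10 identities
x = np.zeros(10); x[3] = 1.0        # true identity as a one-hot vector
L = loss(probs, x, rng.normal(size=40), rng.normal(size=40),
         rng.normal(size=100), rng.normal(size=100))
```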

2.8. graphic



For the grid cells of a certain frequency (Inference)

  1. (1, 30) downsample to (1, 10)
  2. combined with (1, 10) sensory cell to get (1, 100) cells
  • (Constrain) connections from high to low frequency are set to 0



3. Experiments




For Fig 4b middle and right

  • grid representations that are shifted versions of each other, as in the brain
  • the separation into different phases (same frequency) means that two conjunctive place cells that respond to the same stimulus, will not necessarily be active simultaneously - each cell will only be active when their corresponding grid phase is active
  • thus one can uniquely code for the same stimulus in many different locations
  • Across two environments, a given stimulus may occur at the same grid phase but at a different location